Clustering Schema Elements for Semantic Integration of Heterogeneous Data Sources

نویسندگان

  • Huimin Zhou
  • Sudha Ram
چکیده

Interschema relationship identification (IRI), that is, determining the relationships among schema elements in heterogeneous data sources, is an important step in integrating the data sources. This article proposes a cluster analysis based approach to semi-automating the IRI process, which is typically very time-consuming and requires extensive human interaction. The authors apply multiple clustering techniques, including K-means, hierarchical clustering, and self-organizing map (SOM) neural network, to identify similar schema elements from heterogeneous data sources, based on a combination of features such as naming similarity, document similarity, schema specification, data patterns, and usage patterns. An SOM prototype the authors have developed provides users with a visualization tool for display of clustering results as well as for incremental evaluation of candidate similar elements.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

An Improved Semantic Schema Matching Approach

Schema matching is a critical step in many applications, such as data warehouse loading, Online Analytical Process (OLAP), Data mining, semantic web [2] and schema integration. This task is defined for finding the semantic correspondences between elements of two schemas. Recently, schema matching has found considerable interest in both research and practice. In this paper, we present a new impr...

متن کامل

Clustering Similar Schema Elements Across Heterogeneous Databases: A First Step in Database Integration

Interschema relationship identification (IRI), that is, determining the relationships among schema elements in heterogeneous data sources, is an important first step in integrating the data sources. This chapter proposes a cluster analysis-based approach to semi-automating the IRI process, which is typically very time-consuming and requires extensive human interaction. We apply multiple cluster...

متن کامل

Exploiting Schema Knowledge for the Integration of Heterogeneous Sorces

Information sharing from multiple heterogeneous sources is a challenging issue which ranges from database to ontology areas. In this paper, we propose an intelligent approach to information integration which takes into account semantic connicts and contradictions, caused by the lack of a common shared ontology. Our goal is to provide an integrated access to information sources, allowing a user ...

متن کامل

Exploiting Schema Knowledge for the Integration ofHeterogeneous Sources

Information sharing from multiple heterogeneous sources is a challenging issue which ranges from database to ontology areas. In this paper, we propose an intelligent approach to information integration which takes into account of semantic connicts and contradictions, caused by the lack of a common shared ontology. Our goal is to provide an integrated access to information sources, allowing a us...

متن کامل

An ontology-based approach for resolving semantic schema conflicts in the extraction and integration of query-based information from heterogeneous web data sources

There are many external resources and heterogeneous data on the internet that an organization or user may need to improve the decision making process. It is therefore, very important and critical that this information are complete, precise and can be acquired on time. Most web sources provide data in semi-structured form on the internet. The combination of semi-structured data from different so...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • J. Database Manag.

دوره 15  شماره 

صفحات  -

تاریخ انتشار 2004